THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 


VOLUME 29, NUMBER 2 


Masking of Speech by Line-Spectrum Interference* 


J. C. R. LICKLIDER AND NEWMAN GUTTMAN
Massachusetts Institute of Technology, Cambridge, Massachusetts 


(Received October 15, 1956) 


Two series of intelligibility tests were conducted. In the tests, speech was presented against a background 
of interference. The line-spectrum interference consisted of from 4 to 256 sinusoids, superposed in a linear 
adder. Three different spacings of the components in frequency, and several different distributions of power 
among the components, were studied. Tests with continuous-spectrum random noise were made for 
comparison. 

The over-all interference power required to reduce intelligibility to a given level decreases as the number 
of components in the interference increases. The drop is about 10 db in the decade from 4 to 40 components.
Beyond 40, there is much less change. Even 256 components, however, mask measurably less effectively than 
random noise of equivalent power in the band 200-6100 cps. For a given number of components, the line- 
spectrum interference most detrimental to intelligibility has the same number of components in each 
frequency band of equal contribution to intelligibility, and its components are uniform in amplitude. The 
bearing of these findings on the theory of intelligibility and on procedures for predicting intelligibility from
physical parameters is discussed.


FEBRUARY, 1957


PROBLEM 


The experiments to be described were formulated
with reference to a theoretical question: How well
does the theory of intelligibility that stems from the
critical-band hypothesis¹ and the importance-function
hypothesis² account for the masking of speech by
interference consisting of discrete line components of
known frequencies and amplitudes?³


METHOD 


The basic method of the study involved preparing 
samples of speech and samples of interference on mag- 
netic tape, playing them back through a linear adding 
circuit via headsets to listeners, and determining from 
the listeners’ responses measures of intelligibility. 

The speech samples were lists (Harvard “PB” Lists)
of monosyllabic words recorded in a small anechoic
room by one talker (JCRL) with his mouth 12 inches
from a Western Electric 640-AA condenser microphone,
normal incidence. Each test word was imbedded in the
carrier sentence, “You will write ____ now,” and the
talker tried consistently not to give the test words 
unusual emphasis, to stretch them out, or otherwise to 
make them more intelligible than they would be in 
normal context. The carrier sentence was monitored
with the aid of a volume indicator (VU meter) at a level
of about -10 db re 1.0 microbar, average of the peak
needle deflections corrected to 1.0 meter.


* This is Air Force Cambridge Research Center Technical Note
TN-56-66, ASTIA Document No. AD-98831. The work was sup-
ported under Air Force Contract AF 18(600)-1219, monitored by
the Operational Applications Laboratory, AFCRC.

1 H. Fletcher and W. A. Munson, J. Acoust. Soc. Am. 9, 1-10
(1937); H. Fletcher, Revs. Modern Phys. 12, 47-65 (1940);
Schafer, Gales, Shewmaker, and Thompson, J. Acoust. Soc. Am.
22, 490-496 (1950); R. C. Bilger and I. J. Hirsh, J. Acoust. Soc.
Am. 28, 623-630 (1956).

2 N. R. French and J. C. Steinberg, J. Acoust. Soc. Am. 19,
90-119 (1947); L. L. Beranek, Proc. Inst. Radio Engrs. 35,
880-890 (1947); H. Fletcher and R. H. Galt, J. Acoust. Soc. Am.
22, 89-151 (1950); K. D. Kryter, J. Speech and Hearing Dis-
orders 21, 208-217 (1956).

3 The interesting problem of the effect of the phase pattern of
the interference was not investigated. The phase angles were
random.

The interferences were of two types: (1) those of 
direct interest, consisting of various numbers of super- 
posed sinusoids, and (2) those used for purposes of 
comparison, consisting of random noises with uniform 
or shaped spectra. The tapes of the first type were 
prepared by adding the outputs of up to 16 oscillators 
and, when necessary, signals played back from pre- 
viously recorded tapes. The arrangement of apparatus 
for recording the line-spectrum interferences is shown in 
block diagram in Fig. 1. The random-noise interference 
was provided by a gas-tube ionization noise generator. 

Fig. 1. Apparatus used in preparation of line-spectrum tapes:
For n ≤ 16, the outputs of n commercial oscillators were superposed
in the multi-input linear adder. For n > 16, the outputs of 16
oscillators and of one channel of an Ampex 3273 magnetic tape
machine (7½ ips) were superposed in the adder. The resulting
signal was recorded on one channel of an Ampex 3473 machine
(7½ ips). A Berkeley 5571 EPUT meter, a Ballantine 300 volt-
meter, and a DuMont 304-AR oscilloscope made it possible to
adjust the frequencies one at a time within ±3 cps of desired
values and then to set the levels so that, on playback during the
listening tests, the components would lie within a range of 2 db.


Since the phase angles of the components are random
or haphazard and will therefore not be indicated, each
interference is defined by a cumulative power spectrum
$\Phi(f)$. For the line spectra, $\Phi(f)$ rises in steps from 0 at
zero frequency to w (the total power) at and beyond the
frequency of the highest component. Each step corre-
sponds to an individual component and has a height
equal to its power $\phi(f)$.


$$\Phi(f) = \int_0^f \phi(\nu)\,d\nu. \qquad (1)$$


It is convenient, however, to break the specification
down into parts: (1) the total power w, (2) the number
n of components, (3) a smooth function $\Psi(f)$ of fre-
quency governing the relative density, or proximity to
one another along the frequency scale, of the compo-
nents, and (4) another smooth function $\Theta(f)$ of fre-
quency governing the relative powers of the compo-
nents. We may think of holding w, $\Psi(f)$, and $\Theta(f)$
constant and increasing n from some small number to
approach infinity. We then have more and more
components, closer and closer together, and compen-
satingly weaker and weaker. They provide increasingly
good approximations to a random noise, the spectrum
of which has been shaped by a filter.

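To relate the second specification to the first, we have
for n → ∞,


$$\Phi(f) \approx \int_0^f \Psi(\nu)\,\Theta(\nu)\,d\nu. \qquad (2)$$


And, for any sizable n, if we average over a large
enough interval of frequency,


$$\langle \phi(f) \rangle_{\mathrm{av}} = \langle \Psi(f)\,\Theta(f) \rangle_{\mathrm{av}}. \qquad (3)$$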
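
The stepwise cumulative spectrum of a line-spectrum interference can be illustrated with a minimal Python sketch. It assumes equal-power components, as in the recorded samples. The four component frequencies, the unit total power, and the function names are hypothetical choices made for illustration, not values from the tests.

```python
import math

def equal_power_lines(freqs_cps, total_power):
    """Give each of the n components the same power, total_power / n."""
    share = total_power / len(freqs_cps)
    return [(f, share) for f in freqs_cps]

def cumulative_power(lines, f_cps):
    """Cumulative power spectrum: summed power of all components at or below f_cps."""
    return sum(p for fc, p in lines if fc <= f_cps)

def per_component_level_db(n):
    """Level of one equal-power component relative to the total power, in db."""
    return -10.0 * math.log10(n)

# Hypothetical four-line interference with unit total power.
lines = equal_power_lines([500.0, 1000.0, 2000.0, 4000.0], total_power=1.0)
for f in (0.0, 750.0, 1500.0, 3000.0, 6100.0):
    # The printed value rises in steps and reaches the total power
    # at and beyond the highest component.
    print(f, cumulative_power(lines, f))

print(per_component_level_db(4), per_component_level_db(40))
```

With the total power held fixed, a tenfold increase in the number of equal-power components lowers each component by 10 db; this is arithmetic only and says nothing by itself about masking.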


In preparing the interferences, we recorded various
numbers of sinusoids with the three different line-
density⁴ functions shown in Fig. 2: $\Psi_1(f)=c$, uniform;
$\Psi_2(f)=e^{-kf}$, negative exponential; and $\Psi_3(f)$, the
“importance function” defining the contributions to
intelligibility of the various frequency bands of speech.
The first line-density function corresponds to equal
spacing in frequency, the second to exponentially
increasing⁵ intervals of frequency between components
(i.e., equal spacing on a logarithmic frequency scale),
and the third to something in between, but nearer to
exponential than uniform. The third function was
obtained (with the aid of smoothing and interpolation)
from the band limits and midpoints of the 20 bands of
equal contribution to intelligibility listed by Beranek.²
The numbers of components in the various samples
were: $\Psi_1(f)$: 16, 32, 64, 128; $\Psi_2(f)$: 16, 32, 64, 128;
$\Psi_3(f)$: 4, 8, 16, 32, 64, 128, 256. Random noise may be
regarded, for practical purposes, as having n = ∞.


Fig. 2. The solid curves show, for the three line-density func-
tions used in the tests, the fraction of the n line components that
lie between 0 cps and the frequency indicated on the abscissa.
To locate the n components on the frequency scale, a tabular
procedure was employed, the procedure being equivalent to
dividing the ordinate scale into 2n equal intervals and projecting
the upper limits of the odd-numbered ones over to the selected
curve and then down to the abscissa scale. The dashed curve
(Mel) is the subjective pitch function [S. S. Stevens and J. Volk-
mann, Am. J. Psychol. 53, 329-353 (1940)], shown for comparison.
[Abscissa: frequency f in cycles per second, 0 to 6000. Curves:
integrals of the uniform, negative-exponential, and importance
functions, and the mel function.]
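
The placement procedure described for Fig. 2 (divide the cumulative-fraction scale into 2n equal intervals and read off the frequencies at the upper limits of the odd-numbered ones) amounts to evaluating the inverse of the cumulative line-density curve at the fractions (2k-1)/(2n). The Python sketch below does this numerically for any positive density function. The band limits of 200 and 6100 cps, the grid size, the function name `place_components`, and the 1500-cps decay constant of the exponential example are assumptions made for illustration; the constants actually used in the tests are not given here.

```python
import math

def place_components(density, n, f_lo=200.0, f_hi=6100.0, grid=20000):
    """Return n component frequencies at cumulative fractions (2k-1)/(2n)."""
    step = (f_hi - f_lo) / grid
    freqs = [f_lo + i * step for i in range(grid + 1)]
    # Trapezoidal running integral of the line-density function.
    cum = [0.0]
    for i in range(grid):
        cum.append(cum[-1] + 0.5 * (density(freqs[i]) + density(freqs[i + 1])) * step)
    total = cum[-1]
    placed = []
    j = 0
    for k in range(1, n + 1):
        want = total * (2 * k - 1) / (2 * n)
        # Advance to the grid cell whose upper edge reaches the wanted fraction.
        while cum[j + 1] < want:
            j += 1
        span = cum[j + 1] - cum[j]
        frac = (want - cum[j]) / span if span > 0 else 0.0
        placed.append(freqs[j] + frac * step)
    return placed

# Uniform density: equal spacing in frequency.
uniform = place_components(lambda f: 1.0, 4)
# Negative-exponential density with an assumed (hypothetical) decay constant.
exponential = place_components(lambda f: math.exp(-f / 1500.0), 8)
print([round(f, 1) for f in uniform])
print([round(f, 1) for f in exponential])
```

For the uniform density the four components fall at the centers of four equal sub-bands; for the falling density the gaps between neighbors widen toward high frequencies.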

In each recorded sample, the components were of
equal amplitudes, and the total power was fixed at a
constant value for all samples. Upon playback, the
spectra were shaped by filters to produce the desired
distributions of amplitude. Several different filters
were used. Their frequency-response curves are shown
in Fig. 3.


Fig. 3. Gain-versus-frequency curves for filters used to shape
the spectra of the interferences: The power-transmission coeffi-
cients (at frequency f) of the importance-function and speech-
spectrum filters are approximately proportional to the importance
for intelligibility of, and to the speech-power density in, an
interval of frequency Δf cps wide centered upon f. The solid
curves are so located that random noises with the corresponding
power-density spectra masked speech equivalently in Experiment
1. The levels of the other curves are arbitrary. The curves of this
graph are referred to as filter characteristics and as power distribu-
tion functions in the text.
[Ordinate: relative gain in db. Abscissa: frequency in cycles per
second.]


Two separate series of listening tests were conducted. 
In the first series, Experiment 1, a streamlined form of 
intelligibility test was used. The aim was to cover the 
considerable number of test conditions rapidly, but 
with reasonable reliability, in order to determine the 
structure of the problem and to map out the domain to 
be measured in the second series, Experiment 2, with 
standard articulation tests. 


4 Density or mutual proximity of the spectral lines, not the
same as power density or spectral density.

5 The spacing increases; the proximity of one component to the
next decreases as we go up the frequency scale.


EXPERIMENT 1 
Apparatus and Procedure 


The setup for Experiment 1 is shown in Fig. 4. The 
amplitude-versus-frequency response of the over-all 
system, excluding shaping filters but including tape 
recorders and earphones, was uniform within ±3 db
from 100 to 6100 cps. The essential features of the 
streamlined method were the use of one listener at a 
time (two in all), the scoring of each response as it was 
made, and the adjustment of the speech level for the 
next test word on the basis of the correctness or in- 
correctness of the previous response—down a decibel 
for right, up a decibel for wrong. The listener repeated 
each word aloud to the experimenter as soon as she 
heard it and decided what it was. The experimenter 
compared the reported word with the correct word and 
operated the stepping control. The stepping attenuator 
was set on its middle step at the beginning of each test 
of 50 words, and the speech-to-noise ratio was adjusted 
to somewhere near the value for marginal intelligibility 
with the aid of the first attenuator (Fig. 4). By the time 
25 words had been presented, the stepping attenuator 
was “hunting” about the signal level corresponding to 
50% word articulation. The modal setting for the last 
25 words was taken as giving the critical signal level. 
Except in one instance mentioned later, each point 
plotted in the graphs of Experiment 1 is based on at 
least two tests with each of two listeners. 
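
The streamlined up-down procedure can be sketched in Python as a simple simulation. The listener model (`sharp_listener`, a hypothetical listener who is always right at or above a threshold and always wrong below it), the level scale in attenuator steps relative to the starting setting, and the rule of taking the lowest level when two settings tie for the mode are all assumptions introduced for illustration.

```python
import random
from collections import Counter

def run_staircase(p_correct, n_words=50, tail=25, start_db=0, step_db=1, seed=0):
    """Down one step after a correct response, up one step after a wrong one.

    Returns the modal setting over the last `tail` words and the full track.
    """
    rng = random.Random(seed)
    level = start_db
    track = []
    for _ in range(n_words):
        track.append(level)
        correct = rng.random() < p_correct(level)
        level += -step_db if correct else step_db
    counts = Counter(track[-tail:])
    top = max(counts.values())
    # Assumed tie rule: take the lowest of the equally frequent settings.
    modal = min(lv for lv, c in counts.items() if c == top)
    return modal, track

def sharp_listener(level_db, threshold_db=-4):
    """Hypothetical listener: probability of a correct response at a given level."""
    return 1.0 if level_db >= threshold_db else 0.0

modal, track = run_staircase(sharp_listener)
print(modal, track[:10])
```

A graded probability function can be passed in place of `sharp_listener`; the track then hunts about the level at which right and wrong responses are equally likely.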

In order to hold down the fluctuations that would be 
introduced by variations in level from one test word to 
the next, the test samples were re-recorded through a 
volume-compression circuit. This circuit had an attack 
time constant of 0.06 sec and a decay time constant of 
0.10 sec. It compressed 5 decibels into 1 decibel over the 
range of test-word levels. Inasmuch as the volume 
compression was used only in Experiment 1, and not in 
Experiment 2, the comparability of the results of the 
two experiments is evidence that the volume compres- 
sion was an acceptable artifice in this application. 
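
The two stated properties of the compressor (5 decibels compressed into 1, with time constants of 0.06 and 0.10 sec) can be illustrated with a rough Python sketch. The circuit itself is not described here, so the one-pole envelope smoother, the reference level, and the function names are assumptions and should not be read as the circuit actually used.

```python
import math

def compress_level_db(level_db, ref_db=0.0, ratio=5.0):
    """Static characteristic: a deviation of `ratio` db from ref_db becomes 1 db."""
    return ref_db + (level_db - ref_db) / ratio

def smooth_envelope(envelope, dt, attack_s=0.06, decay_s=0.10):
    """One-pole smoother (assumed form) with separate attack and decay time constants."""
    a_up = math.exp(-dt / attack_s)
    a_down = math.exp(-dt / decay_s)
    y = envelope[0]
    out = []
    for x in envelope:
        a = a_up if x > y else a_down
        y = a * y + (1.0 - a) * x
        out.append(y)
    return out

# Word levels spread over 10 db come out spread over 2 db.
print([compress_level_db(v) for v in (-5.0, 0.0, 5.0)])

# Response to a unit step, sampled every millisecond for 0.06 sec.
step_response = smooth_envelope([0.0] + [1.0] * 60, dt=0.001)
print(step_response[-1])
```

After one attack time constant the smoothed envelope has covered the fraction 1 - 1/e of the step, which is the usual meaning of a time constant for a first-order response.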

The speech-to-interference ratios were measured in 
a way that yields true power ratios of specifiable 
segments of the speech and noise. An electronic gating 
arrangement separated the key word—or, more ac- 
curately, the speech wave in an interval 0.40 sec long, 
immediately following the plosion in “‘write”—from the 
carrier sentence and passed the key-word wave form 
through a squaring circuit (part of Ballantine true 
root-mean-square voltmeter Model 320) into a volt- 
age integrator. It then passed a 0.40-sec segment of the 
interference through the same square-law circuit into a 
second, matched integrator. At the end of 50 words, 
the two squared-voltage integrals were read with a 
voltmeter, and their ratio was taken to be the ratio of 
speech power to interference power for the test. In 
Experiment 1, the foregoing procedure was used to 
calibrate secondary measurements made by reading 
speech and interference separately and directly from the
true root-mean-square voltmeter. Because both the
compressed speech and the interference were quite
homogeneous over time, the calibrated secondary
measurements were essentially as good as the primary.
In Experiment 2, with the speech not compressed, the
square-law integration was used for each word, and for
its paired segment of interference, of each test. In all
the tests, the interference level in the band 200-6100 cps
was approximately 6 db re 1 microbar.


Fig. 4. Apparatus for Experiment 1: The speech wave is played
back from the upper tape machine (Ampex 400A) through an
attenuator that sets the general level of the signal and then
through a second attenuator that is stepped up and down in 1-db
steps by the experimenter. The interference comes from the noise
generator (Grason-Stadler 455-A), through a shaping filter
(frequency-response characteristics shown in Fig. 3). The speech
and interference are added in a linear mixer and delivered through
a Williamson-type power amplifier and Permoflux PDR-8 head-
phones, in Grason-Stadler earphone cushions, binaurally to the
listener.
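
The square-law measurement reduces to a ratio of two accumulated squared-voltage integrals. The Python sketch below shows that arithmetic on sampled waveforms. The sampling interval, the 1000-cps test waveform, and the function names are hypothetical; the original measurement was made with analog integrators, not sampled data.

```python
import math

def squared_integral(samples, dt):
    """Approximate integral of the squared voltage over one gated segment."""
    return sum(v * v for v in samples) * dt

def speech_to_interference_db(speech_segments, interference_segments, dt):
    """Ratio, in db, of the accumulated speech and interference integrals."""
    s = sum(squared_integral(seg, dt) for seg in speech_segments)
    n = sum(squared_integral(seg, dt) for seg in interference_segments)
    return 10.0 * math.log10(s / n)

dt = 1.0 / 10000.0           # assumed sampling interval
n_samples = 4000             # 0.40 sec at the assumed rate
interference = [math.sin(2.0 * math.pi * 1000.0 * i * dt) for i in range(n_samples)]
speech = [2.0 * v for v in interference]   # stand-in "speech" at twice the voltage

# Fifty gated word segments, each paired with a segment of interference.
ratio_db = speech_to_interference_db([speech] * 50, [interference] * 50, dt)
print(ratio_db)
```

Doubling the voltage quadruples the power, so the stand-in signals differ by 10 log10(4) db, a little over 6 db.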

In the case of random-noise interference, unless the 
spectrum is shaped by a filter that essentially limits the 
band to the span of frequencies important for masking 
speech, the over-all noise level is not very meaningful. 
High-frequency components contribute to the measure 
of noise power but not to the masking of speech, and 
this might make random noise appear to be, per watt, 
a poor masker. In the listening tests, it would have been 
better to have limited the random noise, as the line- 
spectrum interferences were limited, to the interval 
200-6100 cps. However, this was not done. To permit 
plotting the random-noise results in the same graphs as 
the results obtained with the line-spectrum inter- 
ferences, therefore, only the noise power in the interval 
200-6100 cps has been considered in determining the 
speech-to-interference ratios. Certainly the masking 
actually measured was quite the same as would have 
been measured had the noise been thus band-limited. 
The selection of 200 and 6100 cps as the limits of the 
band was dictated, of course, by the appearance of 
those limits in the importance function. 
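
The band-limiting argument can be made concrete with a short Python sketch that integrates a power-density spectrum only between 200 and 6100 cps. The flat spectrum and the 20 000-cps upper limit of the generator are hypothetical; the actual bandwidth of the noise source is not stated here.

```python
import math

def band_power(density, f_lo, f_hi, steps=10000):
    """Midpoint-rule integral of a power-density function between two frequencies."""
    width = (f_hi - f_lo) / steps
    return sum(density(f_lo + (i + 0.5) * width) for i in range(steps)) * width

def flat(f_cps):
    """Hypothetical uniform power density (arbitrary units per cps)."""
    return 1.0

in_band = band_power(flat, 200.0, 6100.0)
overall = band_power(flat, 0.0, 20000.0)   # assumed generator bandwidth
correction_db = 10.0 * math.log10(overall / in_band)
print(in_band, overall, correction_db)
```

Under these assumed numbers the over-all level overstates the in-band level by about 5 db, which is the kind of discrepancy that counting only the power in the band 200-6100 cps avoids.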


Results 


The results of Experiment 1, then, are the values of 
the power signal-to-interference ratio at which approxi- 
mately 50% of the monosyllabic test words were 
reported correctly. These ratios, which have been
converted into decibels, were obtained for the combina-
tions of number of components, line-density function,
and power-distribution function shown in Table I.


Fig. 5. Masking of speech by uniform-amplitude line-spectrum
interference: The curves show, in decibels, the ratio of over-all
average speech power to over-all average interference power for
50% word articulation versus the number of components in the
interference. The two parameters are, specified first, the function
of frequency governing the density (closeness of spacing in
frequency) of the components and, specified second, the function
of frequency governing the relative powers of the individual
components. In this figure, the second function is the same for
each of the curves: the components were all of equal amplitude.
For the solid curve, the components were distributed in proportion
to the importances of the various parts of the frequency scale for
intelligibility, and there was one component in the center of each
of n bands of equal contribution to intelligibility. For the long-
dashed curve, the density of components declined exponentially
with increasing frequency, and the components were spaced
equally on a logarithmic frequency scale. For the short-dashed
curve, the density and spacing were uniform on a linear frequency
scale.
[Ordinate: speech-to-interference ratio in db. Abscissa: number of
components, 4 to 256. A slope of 10 db per decade is marked.]


In Fig. 5, the speech-to-interference ratio required 
for 50% word articulation is plotted against the number 
n of components for several line-density functions. The 
power-distribution function is, in each case, uniform. 
Inasmuch as the best maskers are the ones for which the
highest speech-to-interference ratios are required, it is
evident that the negative-exponential and the impor-
tance-function line-density functions yield more effec-
tive maskers than the linear line-density function.
Masking effectiveness increases with number of compo-
nents, total power being held constant. In the upper-
most curve, the increase is about 10 db per decade up
to 32 components, but quite gradual beyond n = 64.


Fig. 6. Masking of speech by uniform-density line-spectrum
interference: The first parameter is held constant here, the
second varied (see Fig. 5). The several distributions of power
among the components were produced by passing equal-amplitude,
uniform-density interference through filters shown in Fig. 3. The
Reference Curve (importance-function line density, uniform
power distribution) is repeated from Fig. 5 for comparison.
[Ordinate: speech-to-interference ratio in db. Abscissa: number of
components, 4 to 256.]

The effect of varying the power-distribution function 
is illustrated in Fig. 6. For each curve other than the 
reference curve, the lines were equally spaced along the 
frequency scale. Evidently, uniform power distribution 
among equally spaced lines is not effective. The best 
distribution of power when there are few components 
appears to be that governed by the negative-exponen- 
tial function. When there are many components, the 
importance function is best. If we represent the distribu- 
tion of power in frequency by a much-smoothed power 
spectrum in which one strong line is equivalent to 
several weak ones (of same total power) in the same 
neighborhood, then the line-density and power-distribu- 
tion functions may be substituted one for the other 
without altering the picture. Evidently that is not the
proper representation for aural perception, for the
“Uniform, Importance” curve of Fig. 6 is clearly
different from the “Importance, Uniform” reference
curve. Yet there is a possibly significant parallel be-
tween Figs. 5 and 6 in the fact that, in both, the
negative-exponential function is best for low values of
n, the importance function for high values of n. This
result is consistent with the facts (a) that the masking
produced by intense sinusoids spreads up the frequency
scale more than down and (b) that the individual
sinusoids in the negative-exponential interference are
lower in frequency than their mates in the importance-
function interference. On the whole, the power-dis-
tribution function does not appear here to make as
much difference as the line-density function. The four
“Uniform” curves of Fig. 6 are rather closely bunched
together.


TABLE I. Combinations tested in Experiment 1. The entries in
the table are values of n, the number of sinusoidal components in
the interference. All integral powers of 2 between the values of n
shown were tested.

                                         Line-density function
                                         $\Psi_1(f)$  $\Psi_2(f)$    $\Psi_3(f)$
Power-distribution function              Linear       -Exponential   “Importance”
$\Theta_1(f)$ Uniform                    16-128       16-128         4-256
$\Theta_2(f)$ Inverse to f               16-128
$\Theta_3(f)$ Importance function        16-128
$\Theta_4(f)$ Speech spectrum            16-128
$\Theta_5(f)$ Proportional to f          16-128
$\Theta_6(f)$ Inverse importance         16-128


If we select the “Importance” and “-Exponential”
density functions as the most promising at this stage,
the question arises, what is the best power-distribution
function to pair with them? In Fig. 7, the possibility is
investigated that high-frequency components should be
emphasized enough to make up for their relatively wide
spacing. The importance-function density is com- 
pensated for by an inverse-importance-function filter, 
and the negative-exponential density by a positive- 
exponential filter (see Fig. 3). But the compensations 
reduce masking effectiveness. 

Seeing that emphasis of the highs is bad, we turn to 
attenuation of the highs. (We focus now upon the 
importance function as the line-density function for 
further study, thus cutting down the number of tests 
required.) Figure 8 shows the effect of the importance- 
function and the speech-spectrum filters. These tests, 
unlike the others, were made with only one subject, but 
her scores were not systematically higher or lower, in 
the other tests, than those of the other subject. It is 
clear that the relative attenuations of the high-fre-
quency components (and boosts of the low-) decreased
masking effectiveness. Except for the one wild datum
point, it appears that the severe shaping produced by
the speech-spectrum filter is worse than the mild
shaping produced by the importance-function filter.
From the greater separation of the curves in Fig. 8
than in Fig. 6 (Reference Curve excluded), we may
judge that when we get nearly the optimal spacing of
the components, the power distribution becomes more
important. And, from the data presented thus far, we
conclude that line density controlled by the importance
function, with uniform distribution of power among the
lines, offers the most effective line-spectrum masker.


Fig. 7. Masking of speech by uniform-amplitude and by
“reciprocal” line-spectrum interference: The upper curves are for
uniform-amplitude interferences in which the densities of the
lines are governed by the importance function and by the negative
exponential function, respectively. For the lower curves, the
functions governing the relative powers of the individual compo-
nents are the reciprocals of the functions governing the densities
of the components along the frequency scale. Consequently, after
smoothing, the power spectra of the latter two interferences are
uniform.
[Ordinate: speech-to-interference ratio in db. Abscissa: number of
components, 4 to 256.]

Fig. 8. Masking of speech by line-spectrum interferences with
falling power spectra: For each curve, the importance function
governs the density of the components. For the middle and lower
curves, the functions governing the distributions of power among
the components fall with increasing frequency. These inter-
ferences are less effective in masking speech than the interference
(solid curve) specified by importance-function density and
uniform power-per-component. Thus both “rising shaping”
(Fig. 7) and “falling shaping” (here) impair masking effective-
ness, and the uniform-amplitude interference appears to be near
the optimum.
[Ordinate: speech-to-interference ratio in db. Abscissa: number of
components, 4 to 256.]


In the foregoing discussion, it was tacitly assumed
that no pre-emphasis would be used in the speech
channel with which the interference is designed to
interfere. But what if the speech wave is passed before
transmission through a filter that markedly alters the
speech spectrum? To explore that question, we con-
ducted tests with “white” speech, produced by in-
serting (between the speech playback and the first
attenuator in Fig. 4) a filter with the gain-versus-
frequency curve labeled “Inverse Speech Spectrum”
in Fig. 3. The results are shown, relative to the “Impor-
tance, Uniform” curve for unfiltered speech (Reference
Curve), in Fig. 9. Clearly, “Importance, Uniform” and
“-Exponential, Uniform” are still the best maskers.
However, the white speech has to be much stronger
than the unfiltered speech to be understood. In part,
that is due to the combination of volume compression
and “whitening,” but “whitening” alone reduces the
inherent intelligibility of speech, as shown previously⁶
and as checked by us in informal listening tests. We
conclude, therefore, that the interference chosen as
optimal will be at least near-optimal despite tolerable
shaping of the speech spectrum.


Fig. 9. Masking of “white” speech by line-spectrum interfer-
ence: The top four curves are for speech that has been pre-
emphasized, by the filter labeled Inverse Speech Spectrum B in
Fig. 3, to produce an approximately white or uniform power-
density spectrum. The reference curve is for unfiltered speech
and “Importance, Uniform” interference. Note that, for white
speech as for unfiltered speech, the “Importance, Uniform” and
“-Exponential, Uniform” interferences are the most effective
ones tested.
[Ordinate: speech-to-interference ratio in db. Abscissa: number of
components, 4 to 256.]

The final question concerning Experiment 1 is: how 
closely does line-spectrum interference with 128 or 256 
components approximate random noise? Subjectively, 
“Importance, Uniform” interference with 128 compo- 
nents has only a trace of tonality, and with 256 compo- 
nents it sounds almost (the two are just distinguishable 
in A-B comparison) like random noise shaped by the
importance-function filter. It is probably significant that 
“Uniform, Uniform” interference is still quite tonal 
with 128 components, and that the “Importance, 
Uniform” interference approaches subjective equiva- 
lence to random noise faster than any of the others. 
However, the subjective impression is not as definitive 
a criterion for our purpose as is the intelligibility 
test. The best line-spectrum interferences fell somewhat 
short of shaped random noise in the intelligibility tests. 
The results are shown in Table II. Evidently the 128- 
line “Importance, Uniform” interference was 2 or 3 db 
less effective in masking than random noise, and up- 
setting its balance by passing it through a filter sent it 
far behind similarly-filtered random noise. 


TABLE II. Comparison between 128-line interference and con-
tinuous random noise as maskers of speech. Each entry is a speech-
to-interference ratio, in decibels, required for 50% word articula-
tion. The spacing of the lines in the line-spectrum interference was
governed by the importance function. The distribution of power
among the lines was governed by the filters. The power of the
continuous-spectrum interference was measured in the band
200-6100 cps.

                         Unfiltered speech         White speech
Filter                   128-Line   Continuous     128-Line   Continuous
Uniform                   -5.7       -3.3            3.0        3.0
Importance function      -10.0       -3.8            ...        5.0
Negative exponential      ...        -3.0            4.5
Speech spectrum          -13.8       ...             2.2


6 N. B. Gross and J. C. R. Licklider, “The effects of tilting and
clipping upon the intelligibility of speech,” Report PNR-11,
Psycho-Acoustic Laboratory, Harvard University, Cambridge,
Massachusetts (1946).


EXPERIMENT 2
Apparatus and Procedure


The regular articulation tests were conducted with
the equipment arranged as in Fig. 4 except for: (1)
elimination of the stepping attenuator, (2) use of re-
corded interference throughout, and (3) addition of
headsets. Nominal speech and interference levels were
set at the beginning of each test by manual adjustment
of the attenuators, but the actual speech-to-interference
ratio against which the scores were later plotted was
determined during the test with the aid of the measuring
apparatus already described. Inasmuch as the power
amplifier had very low output impedance and plenty of
power capability, the addition of headsets did not alter
the signals. The members of the listening crew were
male M.I.T. students, paid for their services without
reference to their scores but, in compensation for
zealous effort, at the prevailing rate for skilled workers.

Two tests were made at each of six signal-to-inter- 
ference ratios with each of 11 interferences. The test 
sequence consisted of two blocks, random order within 
each block. 


Fig. 10. Representative individual listener scores of articulation
tests: To provide an idea of the spread of the raw data, these three
curves are shown in relation to the individual datum points for
two tests, six listeners per test. (These curves were selected with-
out prior examination of the data. The other data have about
equal spread.) A large part of the variance is due to consistent
differences among the listeners.
[Ordinate: per cent word articulation. Abscissa: speech-to-
interference ratio in db, -20 to 10.]


Results 


First, in order to illustrate the spread of the data, 
three representative curves of percent word articulation 
versus speech-to-interference ratio are shown, with 
datum points for individual listeners, in Fig. 10. A large 
part of the variability is due to individual differences 
among the listeners, some to differences between first 
and second testings. Over all, the vertical spread of 
the scores is about twice as great as would be expected 
if all the conditions had been absolutely homogeneous, 
each word precisely like every other word. The problem, 
with such data, is to pass reasonable curves through the 
arrays of points, taking into account the fact that 
neighboring curves should follow roughly parallel 
courses, but that a progressive change of shape from 
one end of the family to the other is to be expected. 
Theory in this area is not well enough developed to 
provide a mathematical form for the functions. The 
curve-fitting was therefore done by eye. 
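
To show how a speech-to-interference ratio for a chosen criterion, such as 50% word articulation, is read from an articulation curve, the Python sketch below interpolates linearly between plotted points. Linear interpolation is a stand-in chosen for this sketch; the curves in the tests were fitted by eye. The datum points and the function name are hypothetical and are not taken from the experiments.

```python
def ratio_at_score(points, target=50.0):
    """Speech-to-interference ratio (db) at which the score crosses `target`.

    `points` is a list of (ratio_db, percent_correct) pairs; straight-line
    interpolation between neighboring points is assumed.
    """
    pts = sorted(points)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y0 != y1 and (y0 - target) * (y1 - target) <= 0:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    return None  # the target score is not bracketed by the data

# Hypothetical articulation scores for one interference.
curve = [(-20.0, 5.0), (-10.0, 30.0), (0.0, 70.0), (10.0, 95.0)]
print(ratio_at_score(curve, 50.0))
print(ratio_at_score(curve, 90.0))
```

Evaluating several curves at more than one criterion score in this way gives the horizontal separation between them at each criterion.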

Curves for the tests with “Importance, Uniform”
line-spectrum interference are shown in Fig. 11, solid
lines. The differences in masking effectiveness depend 
upon the criterion of masking that is selected. There is 
less horizontal spread at the 90% articulation level than 
there is at the 50% or the 10% level. 

None of the line-spectrum interferences was as 
effective in masking speech, however, as the continuous- 
spectrum noises (dashed curves in Fig. 11). For a given 
amount of interference power in the band 200-6100 cps, 
the noises were better by 2 or 3 db than the 256-compo- 
nent line-spectrum interference. The result is in line with 
the corresponding finding (Table II) of Experiment 1. 
Because of the spread of the data in Experiment 2, we 
would not conclude from it alone that the three noises 
(Fig. 11) are significantly different from one another, 
but the fact that they were set in the same order of 
masking effectiveness by Experiment 1 lends support 
for such a conclusion. However, the more important 


[Figure 11 plot. Ordinate: percent word articulation. Abscissa: speech-to-interference ratio in db. Solid curves: “Importance, Uniform” line spectra, labeled by number of components. Dashed curves: random noise.]


Fig. 11. Masking of unfiltered speech by line-spectrum inter-
ference (solid curves) and by continuous-spectrum interference
(dashed curves) in regular articulation tests: Percent word 
articulation versus speech-to-interference ratio for various num- 
bers of components. The density of spacing of the line components 
was governed by the importance function, and the lines were 
uniform in amplitude. The continuous spectra were shaped by 
filters with amplitude-versus-frequency characteristics shown in 
Fig. 3. 


point is that shaping the masking spectrum differently 
makes no greater difference than it does in masking 
effectiveness. 

The results obtained in the two series of line-spectrum 
tests, streamlined and regular, are compared in Fig. 12. 
The solid curves are equal-articulation contours from 
Experiment 2. The dashed curve is the 50% articulation 
curve from Experiment 1. The principal differences are 
that the regular test scores are a bit higher and that in 
shape the curve from the streamlined tests is more 
closely similar to the 5 and 10% contours than to the 
50% contour from the regular tests. Unfortunately, the 
data are not statistically stable enough to warrant a 
search for a possible rationale for the shift in the 
“break.” 


DISCUSSION 
Comparison with Other Data 


We have found no other experimental results that can 
be compared at all directly with the present results for 




[Figure 12 plot. Ordinate: speech-to-interference ratio in db. Abscissa: number of components (4 to 256), “Importance, Uniform” interference. Contours labeled by percent word articulation.]


Fig. 12. Equal-intelligibility curves based on Fig. 11: The
dashed curve, presented for comparison, is the corresponding 
curve for 50% word articulation from the streamlined tests. 


line-spectrum interference. It may be worthwhile, 
nevertheless, to contrast the relation between masking 
effectiveness and number of components found here 
with the relation determined in tests⁷ with periodic
impulsive interference. (The latter tests were made with 
a laboratory amplitude-modulation radio link with a 
conventional superheterodyne receiver, the interference 
being introduced at radio-frequency. The complicating 
factor of intermodulation must therefore be kept in 
mind. However, unpublished results indicate that the 
following observations hold equally well for audio- 
masking.) The essential difference between periodic 
impulsive interference and the line-spectrum inter-
ference designated “Uniform, Uniform” here is that the
former has a highly organized “all cosine” phase 
pattern, corresponding to a wave form of pulses 
separated by gaps, whereas the latter has an un- 
organized, haphazard pattern of relative phase, corre- 
sponding to a wave form that oscillates continuously in 
time. In the former, if the number n of frequency
components in the speech band is great, the temporal
gaps are wide, and speech may be heard during the
gaps. As a result, as n increases from 1 to 10 or 20,
masking effectiveness increases, but it then levels off and
finally, as n proceeds upwards from 20 or 30, it decreases
again. With all the line-spectrum interferences employed
in the present tests, masking effectiveness increases
monotonically with n. Obviously, one should avoid
impulsive waveforms if he needs effective maskers. 
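The contrast between the two phase patterns can be sketched numerically. The example below sums n equal-amplitude harmonics, once with every phase zero ("all cosine") and once with phases drawn at random, and compares the ratio of peak to rms amplitude. The fundamental of 100 cps, the sampling rate, the seed, and all function names are assumptions chosen for illustration; they do not reproduce the interference actually used in the tests.

```python
import math
import random

def multitone(n, phases, f0=100.0, fs=16000, dur=0.02):
    """Sum n equal-amplitude harmonics of f0 with the given phases (radians)."""
    samples = int(fs * dur)  # 0.02 s at 100 cps covers two full periods
    return [
        sum(math.cos(2.0 * math.pi * f0 * (k + 1) * t / fs + phases[k])
            for k in range(n))
        for t in range(samples)
    ]

def crest_factor(x):
    """Ratio of peak magnitude to rms value."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms

n = 20
rng = random.Random(0)  # fixed seed so the run is repeatable

cosine_phase = multitone(n, [0.0] * n)
random_phase = multitone(n, [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)])

cf_cos = crest_factor(cosine_phase)    # pulses separated by gaps
cf_rand = crest_factor(random_phase)   # continuous oscillation

print(cf_cos, cf_rand)
```

Both waveforms have the same power, since the phases do not affect the rms value over whole periods. The all-cosine sum concentrates that power in brief pulses, leaving the low-level intervals during which speech may be heard.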
When it comes to continuous-spectrum interference, 
there are results comparable to those reported here for 
uniform-spectrum random noise. Four curves relating 
percent word articulation to speech-to-noise ratio are 
shown in Fig. 13. These are presented here primarily to 
show that, whereas there is good agreement about the 


7 J. C. R. Licklider and S. J. Goffard, J. Acoust. Soc. Am. 19,
653-663 (1947). 




[Figure 13 plot. Ordinate: percent word articulation. Abscissa: speech-to-noise ratio in db. Curves labeled by source, including Licklider and Egan.]


Fig. 13. Four different curves relating percent word articulation
to speech-to-noise ratio: In each case, the speech material was 
the Harvard PB lists, and the noise was random noise of uniform 
spectrum over the pass band of the test system. The right-hand 
curves have been shifted 1.8 db to the left to take into account 
the fact that the band width within which the noise was measured 
was 4000 cps instead of 6000 cps. The dashed and dotted curves 
are from the following references: Egan, Miller, Stein, Thompson, 
and Waterman, “Studies on the effect of noise on speech com- 
munication,” OSRD Report No. 2038, Psycho-Acoustic Labo- 
ratory, Harvard University (1943) or J. P. Egan, Laryngoscope 
58, 1-39 (1948); G. A. Miller, Psychol. Bull. 44, 115-129 (1947); 
J. C. R. Licklider, J. Acoust. Soc. Am. 20, 150-159 (1948). 


speech-to-noise ratios at which speech starts to be 
heard and becomes 10 or 20% intelligible, the curves 
rise with markedly different slopes and to somewhat 
different levels. A plausible explanation of the dif- 
ferences among the curves may lie in differences (1) 
in band width of the communication systems and (2) in 
degree of homogeneity of the actual speech samples 
used in the tests. For the left-hand pair of curves, the 
upper cut-off frequency was about 6500 cps; for the 
right-hand pair, it was about 4000 cps. No quantitative 
comparison can be made of the degrees of homogeneity, 
but it is well known that more homogeneous test 
material yields steeper slopes. Until more definitive data 
are available, it will be necessary to admit an uncer- 
tainty of at least 5 db in specifying absolutely the 
speech-to-noise ratio at which speech reaches, for 
example, 75% monosyllabic word intelligibility. Almost 
surely, a concerted effort to isolate and quantify the 
important parameters would improve the absolute 
accuracy considerably. 
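The 1.8-db shift quoted in the caption of Fig. 13 is consistent with a simple band-width correction, sketched below. The sketch assumes, as the caption states, that the noise spectrum is uniform, so that measured noise power is proportional to the band width over which it is measured. The function name is hypothetical.

```python
import math

def bandwidth_correction_db(measured_bw_cps, reference_bw_cps):
    """Change in measured noise power, in db, when a uniform-spectrum noise
    is measured over reference_bw_cps instead of measured_bw_cps."""
    return 10.0 * math.log10(reference_bw_cps / measured_bw_cps)

shift_db = bandwidth_correction_db(4000.0, 6000.0)
print(round(shift_db, 2))
```

With 4000 and 6000 cps the correction comes to a little under 1.8 db, in agreement with the rounded figure in the caption.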


Theory of Speech and Intelligibility 


The principal theory of speech intelligibility is the 
one that is implicit in the methods of calculating 
intelligibility developed by French and Steinberg,²
Fletcher and Galt,? Beranek,? Collard,⁸ Pocock,⁹ and
Strasberg¹⁰ and closely related to Fletcher’s¹ concept of
critical bands in the auditory mechanism. It is the pur- 
pose of the following paragraphs to examine certain 
aspects of that theory in relation to the present results. 

Briefly, the theory postulates that a basic index of 


8 J. Collard, Elec. Commun. 7, 168-186 (1929); J. Collard,
Elec. Commun. 8, 141-163 (1930); J. Collard, Elec. Commun. 11,
226-233 (1930).

9 L. C. Pocock, Elec. Commun. 18, 120-132 (1939).

10 M. Strasberg, Report No. 371-N-12, Bureau of Ships, U. S.
Navy (1952).




intelligibility (“Articulation Index”), to which various
measures such as percent word articulation are related 
by definite, single-valued, monotonic functions, is 
proportional to the area covered by speech in a two- 
dimensional plot. The scales corresponding to the two 
dimensions are distorted scales of speech-to-noise ratio 
(ordinate) and frequency (abscissa). The distortion of 
the former scale is logarithmic. The distortion of the 
latter is governed by the importance function: bands 
that contribute equally to intelligibility are equally 
wide on the distorted frequency scale. The area to which 
articulation index is proportional is an area bounded as 
follows: (a) on the top by the “threshold of feeling” 
or by a curve 12 db above the long-time-average 
speech power-density spectrum, whichever is lower, and 
(b) on the bottom by the “threshold of hearing,” or by 
a curve 18 db below the speech spectrum, or by the 
power-density spectrum of the interference, whichever 
is higher. In the case of line-spectrum interference, the 
usual practice is to determine an equivalent power- 
density spectrum by distributing the power of each line 
over a band of frequencies. 
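The area rule just described can be sketched in a few lines. The sketch below is a simplified reading of the rule, not the computational procedure of any of the cited authors: it assumes bands of equal importance, one speech level and one equivalent interference level per band (both in db on the same reference), and it ignores the thresholds of hearing and of feeling. The function name and the normalization to a maximum of 1 are assumptions.

```python
def articulation_index(speech_db, interference_db,
                       peak_margin_db=12.0, floor_margin_db=18.0):
    """Fraction of the articulation area left uncovered by interference.

    Each band is assumed to contribute equally to intelligibility, that is,
    to be equally wide on the importance-function scale.
    """
    if len(speech_db) != len(interference_db) or not speech_db:
        raise ValueError("need one speech and one interference level per band")
    span_db = peak_margin_db + floor_margin_db  # 30 db with the defaults
    audible = 0.0
    for speech, interference in zip(speech_db, interference_db):
        top = speech + peak_margin_db  # 12 db above the long-time average
        # Bottom edge: 18 db below the speech spectrum, or the interference
        # spectrum, whichever is higher.
        bottom = max(speech - floor_margin_db, interference)
        audible += max(top - bottom, 0.0) / span_db
    return audible / len(speech_db)

# Hypothetical four-band example with speech at 60 db in every band.
quiet = articulation_index([60.0] * 4, [0.0] * 4)
half = articulation_index([60.0] * 4, [57.0] * 4)
masked = articulation_index([60.0] * 4, [72.0] * 4)
print(quiet, half, masked)
```

Because only the level in each band enters the sum, a rule of this kind is indifferent to how the interference power within a band is divided among components.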

In the formulations developed for the purpose of 
predicting intelligibility from physical parameters, as 
a simplification to facilitate computation, a definite 
number of contiguous bands is substituted for the near- 
infinitude of overlapping bands conceived of in the basic 
theory. In relating the theory to the present results, it is 
important to keep in mind that there is a critical band 
centered at each point along the frequency scale. There 
is therefore no question of adventitiously hitting or 
missing the centers of the bands. Wherever a component 
is, it is at the center of some band and more or less off 
center in many others. 

The first notion of the theory just sketched that we 
should examine in the light of the present data is the 
notion that the pass bands of the conceptual filters that 
divide up the frequency dimension can satisfactorily be 
regarded as rectangular. To regard them so is to adopt 
the fundamental notion of the first-approximation 
critical-band theory as stated by Fletcher¹: that only
the interference within a rectangular, critical band of 
frequencies masks the signal within the band. This was 
adopted deliberately as a simplifying assumption, 
known not to be correct in detail. Bilger and Hirsh! and 
others have shown that it does not hold in fine approxi- 
mation for masking of sinusoids by bands of noise. The 
question is whether the rectangular assumption is adequate for
our present data. The answer is “evidently no,” for
interference consisting of four sinusoidal components
should, according to the rectangular formulation, mask
at most four critical bands of speech. The total number
of bands certainly cannot be as low as four, for if it were
there would not be such great effect of increasing n.
Therefore, the assumption of remote masking¹¹ is


11 Fletcher,¹ French and Steinberg,² and others have appealed
to “remote masking” to explain the masking of high-frequency 
tones by low-frequency tones, apparently implying that the 




required to explain how four or eight sinusoids can 
reduce speech intelligibility to zero, even when their 
levels are extremely high. A promising way to work 
remote masking into the theory is to assume that the 
myriad, overlapping “critical-band filters” have wide
skirts, that their characteristics are asymmetrical 
(skewed toward the low end of the frequency scale), and 
that the slope of the skirts varies with the level of the 
signal or that there is nonlinear distortion at a point 
preceding the filters. 

(We may pass by with only a brief note the question, 
whether computational procedures based upon division 
of the frequency scale into as few as three or four bands 
could predict our results successfully. Since these 
procedures are sensitive only to the total interference 
power in a band, and not to how it is divided up, they 
would not distinguish between many and few compo- 
nents. They would not be adequate at all.) 

Next, we may focus attention on the fact, displayed 
in Fig. 12, that for small n the slope of the curve
relating speech-to-interference ratio (for specified level
of intelligibility) to n rises about 10 db per decade.
When we double the number of components, we halve 
the required over-all interference power. (Let us sup- 
pose, now, that we hold the speech power constant and 
vary the interference power. Over a fairly wide range of 
level, that would lead to results essentially identical to 
those obtained with the reverse procedure.) Since, after 
doubling n, there are 2n components with a total of
half the power, each individual component is only one- 
quarter as intense as before. 
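The arithmetic of the preceding paragraph can be checked with a short sketch. It assumes the 10-db-per-decade slope holds exactly, which the data support only for small n, and that the total power is shared equally among the components. The function names are hypothetical.

```python
import math

def total_power_change_db(n, n_ref, slope_db_per_decade=10.0):
    """Change, in db, of the over-all interference power required for a fixed
    level of intelligibility when the number of components goes from n_ref to n."""
    return -slope_db_per_decade * math.log10(n / n_ref)

def per_component_change_db(n, n_ref, slope_db_per_decade=10.0):
    """Change, in db, of the power of each component, assuming the total
    power is divided equally among the components."""
    return (total_power_change_db(n, n_ref, slope_db_per_decade)
            - 10.0 * math.log10(n / n_ref))

total_db = total_power_change_db(8, 4)    # doubling n: total power halved
each_db = per_component_change_db(8, 4)   # each component: one-quarter
each_ratio = 10.0 ** (each_db / 10.0)
print(round(total_db, 2), round(each_db, 2), round(each_ratio, 3))
```

Under these assumptions the over-all power varies inversely with n and the power of each component varies inversely with the square of n.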

This second observation bears on the question, what 
is the metric of the scale of speech-to-interference ratio? 
Is the logarithmic scale of the standard theory ade-
quate? If we persist in assuming rectangular bands, it is
evident that the answer to the latter question is nega-
tive. The trading relation¹² between the ratio S/N of
speech power to interference power and frequency
band width Δf cannot be, as it is given in the standard
theory,

$$\log(S/N)\cdot \mathrm{Im}(\Delta f)=\text{constant}. \tag{4}$$

Rather, it must be something like

$$(S/N)^{1/2}\cdot \mathrm{Im}(\Delta f)=\text{constant}. \tag{5}$$


That is most easily seen with the aid of the unnecessary 
assumption that, when n is small, there is not more than
one component in any critical band. If the components 
produce their effects in separate and independent bands, 


process is different from that of local masking. Certainly it appears 
to follow a different growth curve. To avoid the difficulty in- 
troduced by remote masking in the calculation of loudness, 
Fletcher and Munson! restricted their procedure to relatively 
uniform or smoothly sloping spectra. Such a restriction has not 
been made, or at least has not always been followed, in computa- 
tions of intelligibility. 

12 We shall use the symbol Im for the importance function
shown in Fig. 2, and we shall, for the sake of simplicity, let
$\mathrm{Im}(\Delta f)$ be $\mathrm{Im}(f_2)-\mathrm{Im}(f_1)$, $\Delta f=f_2-f_1$.




the total band width covered is proportional to n, and
the speech-to-interference ratio in each band is propor-
tional to $n^{-2}$. The fact that the square root of $n^{-2}$
multiplied by n is constant checks Eq. (5). However,
$(S/N)^{1/2}$, which is the rms voltage ratio of speech to
interference, is not at once appealing as a substitute for
log (S/N).

At first thought, it may appear possible to salvage the 
relation of Eq. (4) by replacing the rectangular fre- 
quency bands with appropriate band-pass filter char- 
acteristics. The problem is (still) to make the fraction 
of the articulation area that is covered by masking 
remain constant when the number of components is 
multiplied by n and the individual powers of the
components are divided by $n^2$.

If we focus attention on a single “critical-band 
filter” in the auditory system, one centered on a 
component already present, we see that the addition of 
n − 1 weakened outlying components must make up
for the weakening by the factor $n^2$ of the central one.
But this is impossible, for—since the filter has a band- 
pass characteristic—the fact that the new components 
are outlying means they have less weight than the one 
in the center. Equivalently, we may look at the whole 
area at once: Each doubling of n doubles the number
of patches of masking. They partly overlap, and they 
reinforce one another where they overlap. But each 
patch is reduced in size when the number is increased. 
The total area covered can remain constant, therefore, 
only if either (a) the interaction in the regions of overlap 
corresponds to a highly nonlinear law of summation, or 
(b) the ordinate scale itself is linear in $(S/N)^{1/2}$ as
required by Eq. (5). Alternative (a), though not taken
into account in the existing theory, may play a role. 
However, it should not be relied upon to explain the 
entire phenomenon we have been discussing. 

The third point at which the present data depart from 
the standard theory has to do with matching of the 
interference spectrum to the speech spectrum. Alternate 
forms of the theory predict that the most effective 
masker will be (1) a noise shaped by a “speech- 
spectrum” filter and (2) a noise shaped in such a way as 
to produce a “masked threshold” curve of the same
shape as the speech spectrum. In our tests, the best 
maskers had much more uniform spectra than either of 
those. Several qualifications are necessary here. First, 
the volume compression used in Experiment 1 does 
boost the high-frequency end of the spectrum. Second, 
the single talker of our tests has a higher-than-average, 
but not at all unusual, ratio of consonant power to 
vowel power. This also boosts the high-frequency end 
of the spectrum, but we have available the spectra of 
speech samples recorded by him in the same anechoic 
chamber under similar circumstances, and they are 
rather typical. Our tentative conclusion is that an effec- 
tive speech masker must produce enough masking in 
the high-frequency bands to mask speech components 




there when they are present. When they are present, 
they are rather strong. Because they are not present 
much of the time, their average power is low. Therefore, 
the interference spectrum must not fall off with increas-
ing frequency as much as the speech spectrum does.

The final feature of the present results that we should 
examine in relation to the theory is the difference in 
masking effectiveness between a line-spectrum inter- 
ference with 256 components and a random noise. This 
difference is only 2 or 3 db, but it appears to be genuine 
and to need explanation. We suspect that the essential 
regularity and predictability of the line-spectrum inter- 
ference, even one with many components, must be 
taken into account in a complete theory. 

It is conceivable, of course, that the critical bands of 
the auditory system are so narrow that some con- 
tribution to intelligibility may be derived from the 
speech components between the lines. With 256 compo- 
nents, however, the lines are separated by only about 
12 cps in the part of the frequency scale where 
Fletcher’s¹ critical bands are 50 or 60 cps wide. That
gives us four or five lines per band in that part of the 
scale. The number per band is even greater where the 
bands are wider. This makes the difference appear 




likely to be an essential difference between random and 
determinate interferences. 

We recognize that the now-standard theory of 
intelligibility rests on a rather firm base of experiment 
and measurement and that it has been confirmed to 
greater or less degree by a considerable number of 
empirical tests. We appreciate, also, that the present 
body of results is by no means definitive, being derived, 
as it is, from the speech material of a single talker and 
from a limited series of listening tests. Nevertheless, the 
present results provide a much more stringent test of 
certain basic features of the theory than has been 
provided heretofore. The theory does not meet these 
tests well. We do not consider that the foregoing 
discussion constitutes a refutation of the theory, or that 
it is now time to offer a substitute theory that meets the 
present tests. Rather, our conclusion is that there are 
several aspects of the theory which should be subjected 
to further and closer scrutiny. In particular, the metric 
of the ordinate scale, the curve chosen to represent the 
speech spectrum in computations of intelligibility, and 
the shape of the critical-band filters (or, alternatively, 
the rules governing remote masking) should be ex- 
amined carefully. 


THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 


VOLUME 29, NUMBER 2 FEBRUARY, 1957 


On the Relation between the Intelligibility and Frequency 
of Occurrence of English Words* 


Davis Howes†
Operational Applications Laboratory, Air Force Cambridge Research Center, Bolling Air Force Base, Washington, D. C.


(Received October 16, 1956) 


The threshold of intelligibility for a word in a wide-spectrum noise is shown to be a decreasing function 
of the frequency with which the word occurs in general linguistic usage (word frequency). The drop in 
threshold is about 4.5 db per logarithmic unit of word frequency. This rate is independent of the length of 
the word, although the thresholds for words of given frequency of occurrence are lower for long words. 

The effect of restricting the listener’s alternatives in an intelligibility test to a specified number of words 
is calculated from this relationship. These calculations come within 1 db of published experimental data. 
Theoretical functions relating intelligibility threshold to word length are also calculated from the word- 
frequency effect, on the assumption that listeners can discriminate the length of a word at levels too low for 
it to be identified. These functions are in general agreement with the experimental results. 

Implications for intelligibility testing procedures are discussed. 


THE frequency of occurrence of a word in general
linguistic usage has been shown repeatedly to 
have a large effect upon the duration for which that 
word must be presented visually in order for an observer 
to identify it. A similar result is found if nonsense 


* This is AFCRC TR 56-2, ASTIA Document No. AD 98830. 

† Present address: Department of Economics and Social
Science, MIT, Cambridge 39, Massachusetts. 

1 D. Howes and R. L. Solomon, J. Exptl. Psychol. 41, 401-410
(1951).

2 McGinnies, Comer, and Lacey, J. Exptl. Psychol. 44, 65-69
(1952).

3 J. J. DeLucia and R. Stagner, J. Personal. 22, 299-309 (1954).

4 L. Postman and B. Conger, Science 119, 671-673 (1954).


words are used and the frequency of occurrence is 
replaced by the frequency with which the observer 
repeats the nonsense word in a preliminary training 
phase of the experiment.⁵,⁶ For the latter type of ex-
periment the effect of word frequency applies also to
the luminance threshold of a word when its duration 
is held constant.®.” 


5 R. L. Solomon and L. Postman, J. Exptl. Psychol. 43, 195-201
(1952).

6 P. King-Ellison and J. J. Jenkins, Am. J. Psychol. 67, 700-
703 (1954).

7 K. E. Baker and H. Feldman, Am. J. Psychol. 69, 278-280
(1956).


